Finding Multivariate Splits in Decision Trees Using Function Optimization
Abstract
We present a new method for top-down induction of decision trees (TDIDT) with multivariate binary splits at the nodes. The primary contribution of this work is a new splitting criterion called soft entropy, which is continuous and differentiable with respect to the parameters of the splitting function. Using simple gradient descent to find multivariate splits and a novel pruning technique, our TDIDT-SEH (Soft Entropy Hyperplanes) algorithm is able to learn very small trees with better accuracy than competing learning algorithms on most of the datasets examined.

Finding a splitting function at a node of a decision tree is a search problem, and we choose to view it as unconstrained parametric function optimization over the space of hyperplane weight vectors w ∈ R^n. Our objective function is soft entropy, a new continuous approximation to the entropy measure (Quinlan 1986). Soft entropy was chosen for two reasons. First, it is well established that entropy is a good splitting criterion (Buntine & Niblett 1992). Second, softness is important for good generalization in continuous spaces, as shown in Figure 1. Related work takes a similar approach overall, but the OC1 algorithm of Murthy et al. (1993) uses ordinary (hard) entropy as its criterion, and Brodley and Utgoff (1992) describe algorithms using classification error, also a hard splitting criterion.
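The abstract gives no implementation details, so the following is only a minimal sketch of how a differentiable soft-entropy objective over hyperplane weights w ∈ R^n might look, and of finding a split by plain gradient descent. The sigmoid sharpness beta, the finite-difference gradient (the paper presumably derives an analytic one), and all function names are assumptions for illustration.

```python
import numpy as np

def soft_entropy(w, X, y, beta=1.0, eps=1e-12):
    """Differentiable surrogate for the entropy splitting criterion.

    Each example gets a *soft* (sigmoid) membership in the two sides
    of the hyperplane w . x = 0 instead of a hard 0/1 assignment.
    beta controls how closely the sigmoid approximates a hard split.
    (beta and this exact formulation are assumptions, not the paper's.)
    """
    s = 1.0 / (1.0 + np.exp(-beta * (X @ w)))    # soft membership in the "right" child
    total = len(y)
    crit = 0.0
    for side in (s, 1.0 - s):                    # right child, then left child
        weight = side.sum()                      # soft number of examples in this child
        if weight < eps:
            continue
        # soft class proportions inside this child
        p = np.array([side[y == c].sum() for c in np.unique(y)]) / weight
        child_entropy = -np.sum(p * np.log2(p + eps))
        crit += (weight / total) * child_entropy # children weighted by soft size
    return crit

def find_split(X, y, lr=0.1, steps=200):
    """Plain gradient descent on soft entropy, via finite differences."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    h = 1e-5
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i in range(len(w)):                  # numerical gradient, coordinate by coordinate
            e = np.zeros_like(w)
            e[i] = h
            grad[i] = (soft_entropy(w + e, X, y) - soft_entropy(w - e, X, y)) / (2 * h)
        w -= lr * grad
    return w
```

A bias term fits this formulation by appending a constant 1 feature to every example, so the hyperplane need not pass through the origin.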
Similar Resources
An Incremental Method for Finding Multivariate Splits for Decision Trees
Decision trees that are limited to testing a single variable at a node are potentially much larger than trees that allow testing multiple variables at a node. This limitation reduces the ability to express concepts succinctly, which renders many classes of concepts difficult or impossible to express. This paper presents the PT2 algorithm, which searches for a multivariate split at each node. Be...
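To make the succinctness claim concrete: on continuous inputs, the concept x1 > x2 is a single multivariate (linear) test, while axis-parallel tests can only approximate the diagonal boundary with a staircase whose size grows with the required resolution. A small illustrative sketch, not taken from PT2:

```python
def oblique_test(x1, x2):
    # One linear test with weights (1, -1) expresses "x1 > x2" exactly.
    return x1 - x2 > 0

def univariate_staircase(x1, x2, k=4):
    # A tree restricted to single-variable tests can only approximate the
    # diagonal boundary on [0, 1]^2 with a k-step staircase, needing on
    # the order of k tests and still erring in a band near x1 == x2.
    step = 1.0 / k
    cell = min(int(x1 / step), k - 1)  # which vertical slab x1 falls in
    return x2 < cell * step            # conservative: wrong for x2 in [cell*step, x1)
```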
CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits
We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search to find good univariate split functions. In contrast, our method computes a linear combination of the features at each node, and optimizes the parameters of ...
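At test time, an oblique node like those described above costs one dot product per tree level. A minimal routing sketch, with our own node layout and names rather than anything from the CO2 Forest paper:

```python
import numpy as np

def route(x, node):
    """Route one example through a tree whose internal nodes hold an
    oblique split (w, b) rather than a (feature, threshold) pair."""
    while node.get("leaf") is None:
        # linear combination of all features, not a single-feature test
        node = node["right"] if x @ node["w"] > node["b"] else node["left"]
    return node["leaf"]

# A depth-1 example tree: predict class 1 above the plane x0 + x1 = 1.
tree = {"w": np.array([1.0, 1.0]), "b": 1.0,
        "left": {"leaf": 0}, "right": {"leaf": 1}}
print(route(np.array([0.9, 0.4]), tree))  # -> 1
```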
Efficient Non-greedy Optimization of Decision Trees
Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy procedure often leads to suboptimal trees. In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with ...
Extreme Multi Class Classification
We consider the multi class classification problem under the setting where the number of labels is very large and hence it is very desirable to efficiently achieve train and test running times which are logarithmic in the label complexity. Additionally the labels are feature dependent in our setting. We propose a reduction of this problem to a set of binary regression problems organized in a tr...
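Although the snippet is truncated, the logarithmic-time claim rests on a standard observation: placing the K labels at the leaves of a balanced binary tree means a prediction answers only about log2(K) binary questions. A hedged sketch of that inference pattern, with stand-in decision functions rather than the paper's regressors:

```python
import math

def predict(x, deciders, num_labels):
    """Descend a balanced binary tree over `num_labels` leaf labels.

    `deciders[d][i]` is a stand-in binary decision function for node i
    at depth d; each call halves the candidate label range, so test time
    is O(log2 num_labels) evaluations instead of O(num_labels).
    """
    lo, hi = 0, num_labels               # current candidate label range [lo, hi)
    node = 0
    for depth in range(int(math.ceil(math.log2(num_labels)))):
        mid = (lo + hi) // 2
        if deciders[depth][node](x):     # "go right" decision
            lo, node = mid, 2 * node + 1
        else:
            hi, node = mid, 2 * node
        if hi - lo <= 1:
            break
    return lo
```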